ENH: add ujson support in pandas.io.json #3804
yay!
This is pretty awesome. One thing I think is worth being explicit about in the docs (am I right in saying this?): it only works with valid JSON.
The json routines read/write from strings. Is this typical when dealing with JSON data? Should we have a kw to do this? Always do it?
@jreback That is an excellent point; this should work like all the others (the first thing I did was open a json_file It'd certainly be a useful feature if we could go We could either:
(Also, to clarify my previous point, from_json only reads valid JSON :) )
Do you have a URL that yields JSON?
A more interesting example: https://api.github.com/repos/pydata/pandas/issues?per_page=100
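To make the "valid JSON only" point concrete, here is a minimal sketch that uses Python's standard json module as a stand-in for the ujson-based parser (an assumption: the thread only says from_json reads valid JSON, not exactly which inputs ujson rejects). The helper name is_valid_json is hypothetical.

```python
import json


def is_valid_json(text):
    """Return True if `text` parses under the stdlib's JSON parser."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Double-quoted keys and strings: valid JSON.
print(is_valid_json('{"a": [1, 2, 3]}'))  # True
# Python-style single quotes: not valid JSON.
print(is_valid_json("{'a': [1, 2, 3]}"))  # False
# Trailing commas are not allowed either.
print(is_valid_json('{"a": 1,}'))  # False
```

Note that the stdlib parser is slightly lenient in one respect: by default it accepts the non-standard tokens NaN and Infinity.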
parsed first try!
The one thing that trips it up is dates (you just have to call to_datetime afterwards), but that can be left for another day. Whoop! :)
Yeah... we'll see how this goes. In 0.12 we can add
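Since dates come back from the parser as plain strings, they need a second conversion pass (in pandas that pass is pd.to_datetime on the column). A minimal stdlib sketch of the same idea, assuming ISO 8601 timestamps with a trailing "Z" like the ones in GitHub API responses; the helper convert_date_fields and the sample record are hypothetical.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample shaped like one record from the GitHub issues API.
RAW = '[{"number": 1, "created_at": "2013-06-10T12:34:56Z"}]'


def convert_date_fields(records, fields, fmt="%Y-%m-%dT%H:%M:%SZ"):
    """Replace string timestamps in `fields` with timezone-aware datetimes."""
    for record in records:
        for field in fields:
            value = record.get(field)
            if isinstance(value, str):
                # The format matches 'Z' as a literal character, so attach UTC explicitly.
                record[field] = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return records


issues = convert_date_fields(json.loads(RAW), ["created_at"])
print(issues[0]["created_at"].isoformat())  # 2013-06-10T12:34:56+00:00
```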
I wonder whether any other similar libraries or systems have this much IO functionality in a single package...
(Does infer_types work for Unix timestamps? ...to get the round trip working. Anyway...)
Doubtful, since those are just integers... but I haven't tested it.
I tried date +"%s" | python -c 'import sys; from dateutil.parser import parse; parse(sys.stdin.read())' and that doesn't work, so I'm going to say no, it won't work.
You can do
but there is an issue: sometimes they are not in seconds... so you have to disambiguate.
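One way to do that disambiguation, sketched here as an assumption rather than anything the thread settled on: guess the unit from the magnitude, on the premise that all timestamps fall between early 1973 (1e11 ms) and roughly the year 5138 (1e11 s), so the ranges for seconds, milliseconds, microseconds and nanoseconds don't overlap. The helpers guess_epoch_unit and to_utc_datetime are hypothetical; once the unit is known, pandas can do the conversion itself with pd.to_datetime(values, unit=...).

```python
from datetime import datetime, timezone

# (unit, divisor that converts the value to seconds), coarsest unit first.
UNITS = [("s", 1), ("ms", 10**3), ("us", 10**6), ("ns", 10**9)]


def guess_epoch_unit(value, max_seconds=10**11):
    """Guess the unit of an integer Unix timestamp from its magnitude.

    Assumes dates after early 1973 and before roughly the year 5138,
    so exactly one unit yields a value below `max_seconds`.
    """
    for unit, divisor in UNITS:
        if abs(value) / divisor < max_seconds:
            return unit
    raise ValueError(f"timestamp {value!r} is too large for any known unit")


def to_utc_datetime(value):
    """Convert an epoch timestamp of unknown unit to an aware UTC datetime."""
    divisor = dict(UNITS)[guess_epoch_unit(value)]
    return datetime.fromtimestamp(value / divisor, tz=timezone.utc)


# The same instant (2013-06-10 00:00 UTC) expressed in seconds and in milliseconds.
print(to_utc_datetime(1370822400).isoformat())     # 2013-06-10T00:00:00+00:00
print(to_utc_datetime(1370822400000).isoformat())  # 2013-06-10T00:00:00+00:00
```

The fixed thresholds are the weak point: a genuine seconds value beyond the year 5138, or a millisecond value from before 1973, would be misclassified, so an explicit unit argument is safer whenever the caller knows it.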
Just saying as
Oh yes.
could add a
(obviously far too reckless to just
What's a quick way to fix inconsistent spaces/tabs? Something got screwed up...
M-x untabify
On a region, I think; probably the whole file works too.
@hayd actually, try out:
Quite slow, though?
convert_objects is OK speed-wise: it operates on blocks using Cython functions, so it's gotta be faster than lambdas :)
@jreback correct me if I'm wrong here...
@Komnomnomnom go ahead and paste it here
OK, the following patch should make it safe to call

```diff
diff --git a/pandas/src/ujson/python/JSONtoObj.c b/pandas/src/ujson/python/JSONtoObj.c
index 1db7586..160c30f 100644
--- a/pandas/src/ujson/python/JSONtoObj.c
+++ b/pandas/src/ujson/python/JSONtoObj.c
@@ -10,6 +10,7 @@ typedef struct __PyObjectDecoder
JSONObjectDecoder dec;
void* npyarr; // Numpy context buffer
+ void* npyarr_addr; // Ref to npyarr ptr to track DECREF calls
npy_intp curdim; // Current array dimension
PyArray_Descr* dtype;
@@ -67,9 +68,7 @@ void Npy_releaseContext(NpyArrContext* npyarr)
}
if (npyarr->dec)
{
- // Don't set to null, used to make sure we don't Py_DECREF npyarr
- // in releaseObject
- // npyarr->dec->npyarr = NULL;
+ npyarr->dec->npyarr = NULL;
npyarr->dec->curdim = 0;
}
Py_XDECREF(npyarr->labels[0]);
@@ -88,6 +87,7 @@ JSOBJ Object_npyNewArray(void* _decoder)
{
// start of array - initialise the context buffer
npyarr = decoder->npyarr = PyObject_Malloc(sizeof(NpyArrContext));
+ decoder->npyarr_addr = npyarr;
if (!npyarr)
{
@@ -515,7 +515,7 @@ JSOBJ Object_newDouble(double value)
static void Object_releaseObject(JSOBJ obj, void* _decoder)
{
PyObjectDecoder* decoder = (PyObjectDecoder*) _decoder;
- if (obj != decoder->npyarr)
+ if (obj != decoder->npyarr_addr)
{
Py_XDECREF( ((PyObject *)obj));
}
@@ -555,6 +555,7 @@ PyObject* JSONToObj(PyObject* self, PyObject *args, PyObject *kwargs)
pyDecoder.dec = dec;
pyDecoder.curdim = 0;
pyDecoder.npyarr = NULL;
+ pyDecoder.npyarr_addr = NULL;
decoder = (JSONObjectDecoder*) &pyDecoder;
@@ -609,6 +610,7 @@ PyObject* JSONToObj(PyObject* self, PyObject *args, PyObject *kwargs)
if (PyErr_Occurred())
{
+ Npy_releaseContext(pyDecoder.npyarr);
return NULL;
}
```
DOC: docs in io.rst/whatsnew/release notes/api
TST: cleaned up cruft in test_series/test_frame
…will return a StringIO object)
read_json will read from a string-like or filebuf or url (consistent with other parsers)
…or JSON string
added keywords parse_dates, keep_default_dates to allow for date parsing in columns of a Frame (default is False, i.e. dates are not parsed)
…(which can both be parsed with parse_dates=True in read_json)
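The "string-like or filebuf or url" behaviour described in the commit message amounts to a small dispatch on the input type. A minimal stdlib sketch of that dispatch, assuming the three cases are a file-like object, an http(s) URL, and literal JSON text; the helper read_json_source is hypothetical and is not pandas' implementation (which, for instance, also accepts file paths).

```python
import io
import json
from urllib.request import urlopen


def read_json_source(source):
    """Parse JSON from a file-like object, an http(s) URL, or a JSON string."""
    if hasattr(source, "read"):
        # File-like object (open file, StringIO, ...): parse its contents.
        return json.load(source)
    if isinstance(source, str) and source.startswith(("http://", "https://")):
        # URL: fetch the response body and parse it.
        with urlopen(source) as response:
            return json.loads(response.read().decode("utf-8"))
    # Anything else is treated as the JSON text itself.
    return json.loads(source)


print(read_json_source('{"a": 1}'))                # {'a': 1}
print(read_json_source(io.StringIO("[1, 2, 3]")))  # [1, 2, 3]
```

Guessing whether a string is a URL or literal JSON is inherently ambiguous, which is one reason later pandas versions deprecate passing literal JSON text to read_json and ask callers to wrap it in StringIO instead.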
Patch applied... looking good now.
@jreback Something like this for requests: hayd@dbd968b
@wesm this is mergeable... any objections?
Looks good to me, bombs away.
3, 2, 1...
ENH: add ujson support in pandas.io.json
Awesome. I'll see about merging in upstream changes. Will send through a pull request soonish.
Oh... you have additional dependencies on this?
OK... sure...
Thanks all for making this happen, especially @Komnomnomnom for authoring this code in the first place =)
This is @wesm PR #3583 with this:
It builds now and passes Travis on py2 and py3; it had 2 issues:
Converted to new io API: to_json/read_json
Docs added